BLOOMS on AgreementMaker: results for OAEI 2010

Authors

  • Catia Pesquita
  • Cosmin Stroe
  • Isabel F. Cruz
  • Francisco M. Couto
Abstract

BLOOMS is an ontology matching method developed as part of an ontology extension system. It combines lexical similarity measures with similarity propagation based on semantic distance. For its participation in OAEI 2010, BLOOMS was integrated into the AgreementMaker system, which has competed in previous years. Although BLOOMS was specifically designed to be as automated as possible, and thus favors precision, results were encouraging.

1 Presentation of the system

BLOOMS is an ontology matching method specifically intended for application to biomedical ontologies. The matching of biomedical ontologies has become a focus of interest in recent years due to the increasingly important role that biomedical ontologies are playing in the knowledge revolution that has swept the Life Sciences domain in the last decade.

1.1 State, purpose, general statement

The original purpose of BLOOMS is to provide the ontology matching component of an ontology extension system called Auxesia. Auxesia combines ontology matching and ontology learning techniques to propose new concepts and relations to bio-ontologies. Consequently, BLOOMS was specifically designed to match bio-ontologies, taking into consideration some of their more relevant characteristics: bio-ontologies can have a large number of concepts and usually provide a large textual component in the form of labels, synonyms and definitions; they also typically have few types of relations defined between the concepts and little or no axiomatization. Although BLOOMS was specifically designed to be applied to bio-ontologies, it is a domain-independent strategy, since it can function without external forms of knowledge. To capitalize on the specific characteristics of most bio-ontologies, BLOOMS joins a lexical matcher, which exploits the rich textual component, with a global similarity computation technique, which handles the cases where synonyms exist but are not shared between ontologies.
Furthermore, BLOOMS can also capitalize on annotation corpora, which are a feature of some biomedical ontology initiatives.

1.2 Specific techniques used

BLOOMS has a sequential architecture composed of three distinct matchers: Exact Match, Partial Match and Semantic Broadcast. While the first two matchers are based on lexical similarity, the final one is based on the propagation of previously calculated similarities throughout the ontology graph. Figure 1 depicts the general structure of BLOOMS.

Figure 1. Diagram of BLOOMS architecture.

1.2.1 Lexical similarity

The first two matchers used in BLOOMS rely on lexical similarity based on the textual descriptors of ontology concepts, which include their labels, synonyms and definitions. Since ontology concepts usually have several textual descriptors, the similarity between two ontology concepts is given by the maximum similarity over all possible combinations of their descriptors.

The first matcher, Exact Match, is run on textual descriptors after normalization and corresponds to a simple exact match, where the score is either 1.0 or 0.0.

The second matcher, Partial Match, is applied after each concept's labels, synonyms and definitions have been processed by tokenizing strings into words, removing stopwords, normalizing diacritics and special characters, and finally stemming (Snowball). If the concepts share some of the words in their descriptors, i.e. they are partial matches, the final score is given by a Jaccard similarity: the number of words shared by the two concepts divided by the number of distinct words that appear in either of them. Alternatively, each word can be weighted by its evidence content. The notion of evidence content (EC) of a word [1] is based on information theory and can be considered a term relevance measure, since it measures the relevance of a word within the vocabulary of an ontology.
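As a rough illustration of the Partial Match score, the following Python sketch computes the maximum Jaccard similarity over all descriptor pairs, with an optional per-word weight for the evidence content variant. It is not the BLOOMS implementation. The stopword list and all function names are hypothetical, stemming is left out to avoid external dependencies, and the relative frequency of a word is assumed to be the number of concepts containing it divided by the total number of concepts.

```python
import math
import re

# Hypothetical stopword list; the source does not say which list BLOOMS uses.
STOPWORDS = {"of", "the", "a", "an", "and", "in", "to"}


def tokenize(descriptor):
    """Lowercase a descriptor, split it into words and drop stopwords.

    Stemming (Snowball in the source) is omitted in this sketch.
    """
    words = re.findall(r"[a-z0-9]+", descriptor.lower())
    return {w for w in words if w not in STOPWORDS}


def evidence_content(concepts):
    """Map each word to the negative log of its relative frequency.

    Assumption: relative frequency = concepts containing the word / all concepts.
    `concepts` maps a concept id to the list of its descriptors.
    """
    counts = {}
    for descriptors in concepts.values():
        # A word counts once per concept, however many descriptors repeat it.
        words = set().union(*(tokenize(d) for d in descriptors))
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    total = len(concepts)
    # log(total / n) equals -log(n / total).
    return {w: math.log(total / n) for w, n in counts.items()}


def jaccard(words_a, words_b, weights=None):
    """Jaccard similarity of two word sets, optionally weighted per word."""
    def mass(words):
        if weights is None:
            return float(len(words))
        return sum(weights.get(w, 0.0) for w in words)

    union = mass(words_a | words_b)
    return mass(words_a & words_b) / union if union else 0.0


def partial_match(descriptors_a, descriptors_b, weights=None):
    """Maximum Jaccard score over all descriptor pairs of two concepts."""
    return max(
        (jaccard(tokenize(a), tokenize(b), weights)
         for a in descriptors_a for b in descriptors_b),
        default=0.0,
    )
```

Under these assumptions, a word that occurs in every concept receives a weight of zero, so in the weighted variant it adds nothing to the score of a pair that shares only that word.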
The EC of a word is calculated as the negative logarithm of the relative frequency of that word in the ontology vocabulary. The ontology vocabulary corresponds to all words in all descriptors of all concepts in the ontology. The final frequency of a word corresponds to the number of concepts that contain it in any of their descriptors. This means that a word that appears multiple times in the label, definition or synonyms of a concept is only counted once, preventing bias towards concepts that have many synonyms with very similar word sets.

1.2.2 Semantic Broadcast

After the lexical similarities are computed, they are used as input for a global similarity computation technique, Semantic Broadcast (SB). This novel approach takes into account that the edges in the ontology graph do not all convey the same semantic distance between concepts. The strategy is based on the notion that concepts whose relatives are similar should also be similar. A relative of a concept is an ancestor or a descendant whose distance to the concept is smaller than a factor d. To the initial similarity between two concepts, SB adds the sum of the similarities of all alignments between their relatives, weighted by their semantic gap, up to a maximum contribution of a factor c. The semantic gap between two matches corresponds to the inverse of the average semantic similarity between the two concepts from each ontology.

Several metrics can be used to calculate the similarity between ontology concepts; in particular, measures based on information content (IC) have been shown to be successful [2]. In BLOOMS we currently implement three IC-based similarity measures: Resnik [3], Lin [4] and a simple semantic difference between the two concepts' ICs. The information content of an ontology concept is a measure of its specificity in a given corpus. Many biomedical ontologies possess annotation corpora that are suited to this application.
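To make the information content based measures concrete, here is a small Python sketch of the Resnik and Lin similarities in their standard textbook form, computed over a toy is-a hierarchy. The hierarchy, the annotation counts and the function names are all hypothetical, and the third measure (the simple difference between ICs) is not covered because the text does not define it precisely.

```python
import math

# Toy is-a hierarchy (child -> parents); concept names are hypothetical.
PARENTS = {
    "organ": [],
    "heart": ["organ"],
    "lung": ["organ"],
    "left ventricle": ["heart"],
    "right ventricle": ["heart"],
}

# Hypothetical number of annotations made directly to each concept.
ANNOTATIONS = {
    "organ": 0,
    "heart": 2,
    "lung": 4,
    "left ventricle": 1,
    "right ventricle": 1,
}


def ancestors(concept):
    """Return the concept together with all of its ancestors."""
    seen = {concept}
    stack = [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(concept):
    """IC = -log(p), with p the share of annotations at or below the concept.

    Assumes every concept has at least one annotation at or below it.
    """
    total = sum(ANNOTATIONS.values())
    below = sum(n for c, n in ANNOTATIONS.items() if concept in ancestors(c))
    return math.log(total / below)  # same value as -log(below / total)


def resnik(a, b):
    """IC of the most informative common ancestor of a and b."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(c) for c in common)


def lin(a, b):
    """Resnik score normalised by the ICs of the two concepts."""
    denominator = information_content(a) + information_content(b)
    # Convention of this sketch: two concepts with zero IC score 0.0.
    return 2 * resnik(a, b) / denominator if denominator else 0.0
```

In this toy hierarchy the two ventricles share the ancestor "heart", so their Resnik score is the IC of "heart", while "heart" and "lung" share only the root, whose IC is zero.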
Semantic Broadcast can also be applied iteratively, with each new run using the similarity matrix produced by the previous one.

1.2.3 Alignment Extraction

Alignment extraction in BLOOMS is sequential. After each matcher is run, alignments are extracted according to a predefined threshold of similarity and cardinality of matches, so that concepts already aligned are not processed by the matchers further down the line. Each successive matcher has its own predefined threshold.

1.3 Adaptations made for the evaluation

To participate in OAEI, BLOOMS was integrated into the AgreementMaker system [5], chosen for its extensible and modular architecture. We were particularly interested in benefiting from its ontology loading and navigation capabilities, and from its layered architecture, which allows for serial composition, since our approach combines matching methods that need to be applied sequentially. Furthermore, we also used the visual interface while optimizing our matching strategy. Although it is not a requirement for our methods, we found it extremely useful because it supports a very quick and intuitive evaluation.

Since neither the mouse nor the human anatomy ontology has an annotation corpus, we had to adapt the Semantic Broadcast algorithm to use a semantic similarity measure based on edge distance and depth, so that edges further away from the root correspond to higher levels of similarity.
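The sequential alignment extraction of Section 1.2.3 can be sketched in Python as a pipeline of (matcher, threshold) stages, where each stage only sees concepts that earlier stages left unaligned. This is a minimal sketch, not the BLOOMS or AgreementMaker code. The function names, the two stand-in matchers and the thresholds are hypothetical, and a greedy one-to-one cardinality is assumed because the text does not state which cardinality is used.

```python
def extract(scores, threshold, aligned_src, aligned_tgt):
    """Greedily select one-to-one pairs scoring at or above the threshold.

    `scores` maps (source, target) pairs to similarities. Concepts already in
    `aligned_src` or `aligned_tgt` are skipped, and both sets are updated.
    """
    selected = []
    for (src, tgt), score in sorted(scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # pairs are sorted, so nothing further can qualify
        if src in aligned_src or tgt in aligned_tgt:
            continue
        selected.append((src, tgt, score))
        aligned_src.add(src)
        aligned_tgt.add(tgt)
    return selected


def run_pipeline(stages, sources, targets):
    """Run (matcher, threshold) stages in order on the still unaligned concepts."""
    aligned_src, aligned_tgt = set(), set()
    alignment = []
    for matcher, threshold in stages:
        remaining_src = [s for s in sources if s not in aligned_src]
        remaining_tgt = [t for t in targets if t not in aligned_tgt]
        scores = {(s, t): matcher(s, t)
                  for s in remaining_src for t in remaining_tgt}
        alignment += extract(scores, threshold, aligned_src, aligned_tgt)
    return alignment


# Stand-in matchers for illustration only.
def exact(a, b):
    return 1.0 if a.lower() == b.lower() else 0.0


def word_overlap(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)


result = run_pipeline(
    [(exact, 1.0), (word_overlap, 0.5)],
    ["Heart", "left lung lobe"],
    ["heart", "lung lobe"],
)
```

Here the exact stage aligns "Heart" with "heart", so the second stage only has to score the remaining pair, which keeps the later and more expensive matchers away from concepts that are already settled.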




Publication date: 2010